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1. INTRODUCTION 


In 1950, Lanczos [14] proposed a method for successive reduction of a given, in general 
non-Hermitian, NxN matrix A to tridiagonal form. More precisely, the Lanczos procedure 

generates a sequence H^ n \ n = 1,2, . . of tridiagonal n x n matrices which, in a certain 
sense, approximate A. Furthermore, in exact arithmetic and if no breakdown occurs, the 

Lanczos method terminates after at most n (< N) steps with H ^ a tridiagonal matrix 

which represents the restriction of A or A T to an A-invariant or A^-invariant subspace 

of C N , respectively. In particular, all eigenvalues of are also eigenvalues of A, and, 

in addition, the method also produces basis vectors for the A-invariant or A^-invariant 
subspace found. 

In the Lanczos process, the matrix A itself is never modified and it appears only in 

the form of matrix- vector products A • v and A T • w. Because of this feature, the method 
is especially attractive for sparse matrix computations. Indeed, in practice, the Lanczos 
process is mostly applied to large sparse matrices A, either for computing eigenvalues 
of A or in the form of the closely related biconjugate gradient (BCG) algorithm [15] 
for solving linear systems Ax = b. For large A, the finite termination property is of 
no practical importance and the Lanczos method is used as a purely iterative procedure. 

Typically, the spectrum of offers good approximations to some of the eigenvalues of 
A after already relatively few iterations, t.e. for n •< N. Similarly, BCG — especially if 
used in conjunction with preconditioning — often converges in relatively few iterations to 
the solution of Ax = b. 

Unfortunately, in the standard Lanczos method a breakdown — more precisely, di- 
vision by 0 may occur before am invariant subspace is found. In finite precision arith- 
metic, such exact breakdowns are very unlikely; however, near-breakdowns may occur 
which lead to numerical instabilities in subsequent iterations. The possibility of break- 
downs has brought the nonsymmetric Lanczos process into discredit and has certainly 
prevented many people from using the algorithm on non-Hermitian matrices. Note that 
the symmetric Lanczos process for Hermitian matrices A is a special case of the general 
procedure in which the occurrence of breakdowns can be excluded. 

On the other hand, it is possible to modify the Lanczos process such that it skips over 
those iterations in which exact breakdown would occur in the standard method. This was 
already observed by Gragg [8, pp. 222-223] and, in the context of the partial realization 
problem, by Kung [13, Chapter IV] and Gragg and Lindquist [9]. However, a complete 
treatment of the modified Lanczos method and its intimate connection with orthogonal 
polynomials and Pade approximation was presented only recently, by Gutknecht [10, 11]. 
Clearly, in finite-precision arithmetic, a viable modified Lanczos process also needs to skip 
over near- breakdowns. Taylor [19] and Parlett, Taylor, and Liu [18], with their look-ahead 
Lanczos algorithm, were the first to propose such a practical procedure. However, in [19, 
18], the details of an actual implementation are worked out only for look-ahead steps of 
length 2. We will use the term look-ahead Lanczos method in a broader sense to denote 
extensions of the standard Lanczos process which skip over breakdowns. Finally, note that, 
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in addition to [10], there are several other recent papers dealing with various aspects of 
look-ahead Lanczos methods (see [1, 2, 5, 7, 12, 17]). 

The main purpose of this paper is to present a robust implementation, including 
FORTRAN code, of the look-ahead Lanczos method for general complex non-Hermitian 
matrices. Our intention was to develop an algorithm which can be used as a black box. 
In particular, the code can handle look-ahead steps of any length and is not restricted 
to steps of length 2. On many modern computer architectures, the computation of inner 
products of long vectors is a bottleneck. Therefore, one of our objectives was to minimize 
the number of inner products in our implementation of th e look-ahead Lanczos method. 
Indeed, the proposed algorithm requires the same number of inner products as the classical 
Lanczos process, as opposed to the look-ahead algorithm described in [19, 18], which always 
requires additional inner products. In particular, our implementation differs from the one 
in [19, 18] even for look-ahead steps of length 2. 

This paper consists of Part I and Part II. The outline of the Part I is as follows. In 
Section 2, we recall the standard nonsymmetric Lanczos method and its close relationship 
with orthogonal polynomials. Using this connection, we then describe the basic idea of 
the look-ahead versions of the Lanczos process. In Section 3, some further notation is 
introduced. In Section 4, we outline the sequential look-ahead algorithm, and in Section 5, 
we give details of its actual implementation. In Section 6, we sketch the block version of 
the look-ahead Lanczos method. In Section 7, we make some concluding remarks. 

In Part II [6] of the paper, we describe how the look-ahead Lanczos process can be 
used to compute approximate solutions to Ax = b, solutions which are defined by a quasi- 
minimal residual (QMR) property. We also show that BCG iterates — when they exist 
can be easily obtained from quantities generated by the QMR method. Moreover in 
Part II, we report numerical experiments with the sequential look-ahead Lanczos algorithm 
applied to nonsymmetric eigenvalue problems and to solving nonsymmetric linear systems. 
Finally, the actual codes for the sequential look-ahead algorithm and for the associated 
QMR algorithm appear in the Appendix to Part II. 

Throughout the paper, all vectors and matrices, unless otherwise stated, are assumed 
to be complex. As usual, M T — ( ) and = ( rnji ) denote the transpose and the 
conjugate transpose, respectively, of the matrix M = (m,j). The set of singular values 
of M is denoted by <r(M), with a m&x (M) and <T„un (M) the largest and smallest singular 

value of M , respectively. The vector norm ||x|| = \/x H x is always the Euclidean norm and 
Ml — 0 max(Af) denotes the corresponding matrix norm. Moreover, the notation 

K n (c,B ) := span{c, Be, 

is used for the nth Krylov subspace of generated by c € and the N x N matrix B. 

V n := (<&(A) = 7 o + 71 A -| bTnA” | To,Ti»---» 7 n € C) 

denotes the set of all complex polynomials of degree at most n. Finally, it is always 
assumed that .4 is a complex, in general non-Hermitian, N x N matrix. 
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2. BACKGROUND 


In this section, we briefly recall the classical nonsymmetric Lanczos method [14] and its 
close relationship with formally orthogonal polynomials (FOPs hereafter). Using this con- 
nection, we then describe the basic idea of the look-ahead Lanczos algorithm. 

Given two non-zero starting vectors dj 6C N and w\ £ C N , the standard nonsymmet- 
ric Lanczos method generates two sequences of vectors {vn}^=i and {tu n }£ =1 such that, 
for n = 1, . . . ,Z, 

span{u 1 ,u 2 ,--.,Un} = K n (v u A), 

(2.1) 

s lpan{wi,W2, . . . ,w n } = K n (xvi,A T ), 

and 

wjvj = diSij, with di ± 0, for all i, j = 1, . . . , n. (2.2) 

Here <5, y denotes the Kronecker delta. The actual construction of the vectors v n and w n is 
based on the three- term recurrences 


AVn fin^n— lj 

n>n+i = A T w n - a n w n - /? n tu n _x, 


(2.3) 


where 


n — 


d n 


Pn = 


dn 

d n - 


d n = Wn v n, 


are chosen to enforce (2.2). Note that, for n = 1, we set (3\ = 0 and u<j = u>o = 0 in (2.3). 
Also, letting 


^ (n) — [ y i u 2 t>n] and = [uq w 2 w n ] (2.4) 

denote the matrix whose columns are the first n of the vectors vj and Wj, respectively, and 
letting 

Oj /?2 0 ■ ■ ■ 0 

1 02 '• ’■ i 

0 0 
: '• Pn 

0 ■ • ■ 0 1 a n . 

denote the tridiagonal matrix containing the recurrence coefficients, we can rewrite (2.3) 
as 

AU(") = y(")tf(") + [0 ... o v n+l ], 

(2 5) 

A t W^ = W (n) H (n) + [0 ... 0 u>n+i]. 
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Moreover, the biorthogonality condition (2.2) reads 

(Ty<")) r r<"> = D(”> = diag(d 1 ,*,...,<i„). (2.6) 

Now, let L be the largest integer such that there exist vectors v n and w n , n = 1, . . . , L, 
satisfying (2.1) and (2.2). Note that L < N and that, in view of (2.3), L is the smallest 
integer such that 

^l+i^l+i = 0- (2-7) 

Moreover, let 

L r = L r (vi, A) := dim A) and Li = Li{w\, A T ) := dim7f/v(u? 1 , A T ) (2.S) 

denote the grade of v\ with respect to .4 and the grade of w\ with respect to A T , respectively 
(cf. [20, p. 37]). There are two essentially different cases for fulfilling the termination 
condition (2.7). The first case, referred to as regular termination , occurs when v L+1 = 0 
or W L+ 1 == 0* Clearly, if v L+1 = 0, then L = L r and the right Lanczos vectors uj, . . . , vi T 
span the A-invariant subspace Ki r (v\ , A). Similarly, if = 0, then the left Lanczos 

vectors w \ , . . . , wl, span the .A ^-invariant subspace Kl ( {wi,A t ). Unfortunately, it can 
also happen that the termination condition (2.7) is satisfied with v L+1 ^ 0 and w L+l ^0. 
This second case is referred to as serious breakdown [20, p. 389]. Note that, in this case, 

L < L* := min{L/,L r } 


and the Lanczos vectors span neither an ^-invariant nor an .^-invariant subspace of C^. 

It is the possibility of serious breakdowns, or, in finite precision arithmetic, of near- 
breakdowns , that has brought the classical nonsymmetric Lanczos algorithm into discredit. 
However, by means of a look-ahead procedure, it is possible to leap — except for the very 
special case of an incurable breakdown [19] — over those iterations in which the standard 
algorithm would break down. Next, using the intimate connection between the Lanczos 
process and FOPs, we describe the basic idea of the look-ahead Lanczos algorithm. 

First, note that 

Kn(v\,A) = 

K n (w u A T ) = {*(A r )ti>, | € Vn-i}. 

In particular, in view of (2.3), for n = 1 ,, , . ,L, 

v n = ® B _,(il)wi and w n = 'Z? n -i(A T )w 1 , 


(2.9) 


( 2 . 10 ) 
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where 'P n _i 6 V n -i is a uniquely defined monic polynomial. Then, introducing the inner 
product 

($,'!'):= ($(.4 7 ')t^ 1 ) :r (^'(>4)ui) = (2.11) 

and using (2.1), (2.9), and (2.10), we can rewrite the biorthogonality condition (2.2) in 
terms of polynomials: 

( , i'„_ 1 ,^) = 0 for all € "Pn -2 (2-12) 

and 

($„_ !,$„_!) ^0. (2.13) 

Note that, except for the Hermitian case, i.e. A = A H and w\ = v\, the inner product 
(2.11) is indefinite. Therefore, in the general case, there exist polynomials $ ^ 0 with 
“length” ('P, ’£) = 0 or even (S', $) < 0. 

A polynomial $„_! € V n -i, ^ n -i ^ 0, that fulfills (2.12) is called a FOP (with 
respect to the inner product (2.11)) of degree n - 1 (see e.g. [3], [4], [10]). Note that the 
condition (2.12) is empty for n = 1, and hence any = To ^ 0 is a FOP of degree 0. 
From (2.12), 

$„_i(A) = 70 + 7i A H 1- 7„_iA n_1 

is a FOP of degree n — 1 if, and only if, its coefficients 70, . . • ,Tn-i are a nontrivial solution 
of the linear system 

mo mi 

mi 

m 2 


L m n _ 2 ••• 

Here 

m j ■= wjA’v 1 = (1, A- 7 ), j = 0,1,..., 

are the moments associated with (2.11). A FOP is called regular if it is uniquely 

determined by (2.12) up to a scalar, and it is said to be 3ingular otherwise. Remark that 
FOPs of degree 0 are always regular. By means of (2.14), one easily verifies that a regular 
FOP 'F n — 1 has maximal degree n — 1. In particular, a regular FOP is unique if it is required 
to be monic. Moreover, singular FOPs occur if, and only if, the corresponding linear system 
(2.14) has a singular coefficient matrix, but is consistent. If (2.14) is inconsistent, then 
no FOP 'Fn — 1 exists. This case is referred to as deficient. By relaxing (2.12) slightly, one 
can define so-called deficient FOPs (see [10] for details). Simple examples (see Section 13) 
show that the singular and deficient cases do indeed occur. Thus, regular FOPs ^ n _i need 
not exist for every degree n — 1. We would like to stress that this phenomenon is due to the 
indefiniteness of (2.11). For a positive definite inner product (•, •), unique monic formally 
orthogonal polynomials always exist. 

Finally, given a regular FOP $ 71 - 1 , it is easily checked whether a regular FOP of 
degree n exists. Indeed, using (2.14), one readily obtains the following 


m 2 


m „— 2 


m 2„— 5 
rr ^2n-5 m 2n-4 



' To ‘ 


■ m n _! ■ 


7i 


m n 


72 

= 7n— 1 

m n + 1 


-7n— 2- 


-m 2 „- 3. 


(2.14) 
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Lemma. Let 'I'n-i be a regular FOP ( with respect to the inner product (2.11)) of degree 
n — 1. Then, a regular FOP of degree n exists if, and only if, (2.13) is satisfied. 

Let us return to the standard nonsymmetric Lanczos process (2.3). Using (2.7), (2.10), 
(2.11), and the above lemma, we conclude that the termination index L is the smallest 
integer L for which there exists no regular FOP of degree L. In particular, a serious 
breakdown occurs if, and only if, no regular FOP exists for some L < L+. 

On the other hand, there is a maximal subset 

{ni,n 2 ,...,nj} C (1,2, . . . n x := 1 < n 2 < • • • < n j < X*, (2.15) 

such that, for each j = 1, 2, . . . , J, there exists a monic regular FOP Sf'nj -i € V nj _ x . Note 

that n\ = 1 in (2.15) since ’Fo(^) = 1 is a monic regular FOP of degree 0. It is well- 
known that three successive regular FOPs <P n ^ n> _ x , and \P n . +i _ 1 are connected via 

a three-term recurrence. Consequently, setting, in analogy to (2.10), 

Vnj — *f! ni -\(A)vx and w nj = '5'„ i _ 1 (A :r )u; x , (2.16) 

we obtain two sequences of vectors }j—i and {u> nj . }y =1 which can be computed by 

means of three-term recurrences. These vectors will be called Tegular vectors, since they 
correspond to regular FOPs; the starting vectors u x and u> x are always regular. The look- 
ahead Lanczos procedure is an extension of the classical nonsy mm etric Lanczos algorithm; 
in exact arithmetic, it generates the vectors v nj and w nj , j — 1, . . . , J. If n j = 

in (2.15), then these vectors can be complemented to a basis for an A- invariant or A T - 

invariant subspace of C^. An incurable breakdown occurs if, and only if, nj < L* in 
(2.15). Finally, note that 


wl.v = w T v ni =0 for all v € K nj -i(v lt A), w € K nj - l (w l ,A T ), 

j = 1, . . . , *7. 


(2.17) 


The look-ahead procedure we have sketched so far only skips over exact breakdowns. It 
yields what is called the nongeneric Lanczos algorithm in [10]. Of course, in finite precision 
arithmetic, the look-ahead Lanczos algorithm also needs to leap over near-breakdowns. 
Roughly speaking, a robust implementation should attempt to generate only the “well- 
defined” regular vectors. In practice, then, one aims at generating two sequences of vectors 

)f«i and {">.;,)£=! where 

£ {">}/-!. Ji ~ 1. (2.18) 

is a suitable subset of (2.15). Note that, in (2.18), we set j\ = 1 , since v\ and w\ are always 
regular. The problem of how to determine the set (2.18) of indices of the “well-defined” 
regular vectors will be addressed in detail in Section 4. 
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In order to obtain complete bases for the subspaces K n (vi,A) and K n (wi, A T ), we 
need to add vectors 


v n E K n (vi, A) \ K n -i(vi,A) and w n € K n {w\, A T ) \ K n -i(wi, A T ), 

(2.19) 

^ ^ jk - 1 “b Ij • • • » ™jk Ij ^ — 2, 3, , K , 

to the two sequences }JL l and respectively. Clearly, (2.19) guarantees 

that (2.1) remains valid for the look-ahead Lanczos algorithm. The vectors in (2.19) axe 
called inner vectors. Moreover, for each k, the vectors v n , n = n JJk , nj k + 1, . . . , n Jt+l — 1, 

and correspondingly for w n , are referred to as the A'th block. The inner vectors of a block 
built because of an exact breakdown correspond to singular or deficient FOPs, while the 
inner vectors of a block built because of a near- breakdown correspond to polynomials which 
in general are combinations of regular, singular, and deficient FOPs. We will refer to both 
the regular and the inner vectors v n and w n generated by the look-ahead variant as right 
and left Lanczos vectors, in analogy to the notation of the standard nonsymmetric Lanczos 
algorithm. 

So far, we have not specified how to actually construct the inner vectors. The point is 
that the inner vectors can be chosen such that the u„’s and u> n ’s from blocks corresponding 

to different indices k are still biorthogonal to each other. More precisely, with and 
W (n) defined as in (2.4), we have, in analogy to (2.6), 

(W ( " ) ) 7 V ( " ) * £(">, n = rij k — 1, * = 2,3,...,tf. (2.20) 

Here, is now a nonsingular block diagonal matrix with k — 1 blocks of respective 
size (rij l+l — rij,) x (n JI+t - rij,), l = 1, . . . , k — 1. Similarly, (2.5) holds, for n = rij k - 1, 

k = 2,3, . . . ,K, with Ff (n) (cf. (3.5)) now a block tridiagonal matrix with diagonal blocks 
of size (n il+l - n jt ) x (n i(+1 — n it ), l = 1, . . . , k - 1. 

There are two fundamentally different approaches for constructing inner vectors with 
the property (2.20). In both cases, we generate the inner vectors using a simple three-term 
recurrence. However, in the first approach, each inner vector in a block is biorthogonalized 
against the previous block as soon as it is constructed. This variant will be called the 
sequential algorithm. In the second approach, the entire block is constructed before it is 
biorthogonalized against the previous block and, possibly, depending on the size of the 
current block, against vectors from blocks further back. This variant will be called the 
block algorithm. The sequential algorithm is more suitable for a serial computer, while the 
block algorithm is more suitable for a parallel computer. In this paper, we will focus on the 
sequential algorithm and its implementation, and we will only sketch the block algorithm. 

Finally, two more notes. First, the inner product (2.11) could have been defined as 
($, tf) := ($(A h ) Wl ) H ($(A ) Vl ) = w^(A)^(A) Vl 
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and the algorithms can be formulated equally well in either terms. We use the transpose 
because it simplifies the formulas. Second, in the rest of the paper, we will use the notation 
n k ■= Tij k for the indices of the “well-defined” regular vectors. However, notice that there 
is no guarantee that the indices n* generated by the look-ahead Lanczos algorithm in finite 
precision arithmetic actually satisfy (2.18). 
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3. NOTATION 

In this section, we introduce some further notation. 

We will use the following indices: 

• n = 1,2,... are the indices of the Lanczos vectors v n and tw„; 

• l = 1, 2, ... is used els a counter for the blocks; 

• ni, / = 1,2, . . ni := 1, are the indices of the computed regular vectors; note that n/ 
is also the index of the vector at the beginning of the /th block; 

• hi := — nj, / = 1, 2, . . ., is the size of the /th block; 

• For given n, k = k(n) is the number of the block which contains the Lanczos vectors 

u„ and u>„; note that n* is the index of the last computed regular vector with index 
< n; 

• v and fj. are 0-based indices used to count inside a block; 

• b ji 3-n.d m are general purpose indices. 

For reasons of stability, we will compute scaled versions of the right and left Lanczos 
vectors, rather than the “monic” vectors v n and w n (cf. (2.16)) corresponding to monic 
FOPs. A proven choice (see [18], [19]) is to scale the Lanczos vectors to have unit length. 
By v n and w n we denote the scaled versions defined by 

v n = s n v n and w n = t n w n , s n := ||u„|| ,t n := ||u>„|| . (3.1) 

The scale factors s n and t n in (3.1) can easily reach the overflow or underflow limits; hence, 

instead of storing them directly, we store - , and and we never compute s n 

or /„ directly. Furthermore, in order to save work, the vectors will not actually be scaled 
at every step. Instead, we store vectors v n and u>„, with scale factors a n and £ n , such that 

Vn = C n Vn and Wn = U^n, <T n , > 0, (3.2) 

and we actually carry out the scaling only when <x n or £ n approach the overflow or underflow 
limits. (The scale factors er n are not to be confused with the singular values < 7 m j n and cr max .) 

We identify blocks by their number /. Capital letters with subscript / denote matrices 
whose columns are all the vectors from block /. For example, 

Vi = [v n , U „,+1 ... u n , +l _!] and W { = [ w n , ti> n , +1 ... uj n(+1 _! ] 

are the matrices containing the “monic” right Lanczos vectors and the scaled left Lanczos 
vectors corresponding to block /, respectively. By 5/ and Tj, we denote the diagonal 
matrices whose diagonal entries are the scaling factors, as defined in (3.1), corresponding 
to block /. Note that 

V, = V,S t and Wi = W t Ti. 
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Similarly, E/ and n/ denote the diagonal scaling matrices containing the scalar factors from 
(3.2) corresponding to block /, and then 

Vi = V,E, and W, = W,E,. 

Capital letters with superscripts indicate matrices which contain quantities from 
all previous iterations up to step n. For example, in addition to (2.4), we denote by 

and the matrices whose columns are the first n scaled right and left Lanczos 

vectors, respectively; similarly, and axe the diagonal matrices containing the 
corresponding scaling factors from (3.1). In view of ( 2 . 5 ) and (2.20), we then have, for 
n = 1 , 2 ,..., 


AV {n) = V^)s^H {n \S {n) )~ l +[0 0 

A T W (n) = W (n) T (n) H (n \T (n) )- 1 +{ 0 0 ^w n+1 ], 


and 


Here 




f<*l /? 2 0 


H (n) := 


72 a 2 

o 


0 ' 

0 



Pk 

0 7 * <*k. 


is a n x n block tridiagonal matrix with blocks of the form 



^ • • • • J$c” 


■0 0 1- 


1 : 


: 0 

a/ = 

0 ; 

1 7/ = 

• ■ > 
• • , 
• . • 
• * • 


0 

0 

* 


.0 0 . 


(3.3) 


(3.4) 


(3.5) 


(3.6) 


The blocks are in general full matrices. Notice that is an upper Hessenberg matrix. 

For l < k := k(n) the matrices a /, /3j, and 7 / are of size hfXhi , hi- 1 x h/, and hi x hi- j, 
respectively. The matrices a*, and 7 * corresponding to the current, i.e. kth, block are 

of size hk x hk, hk-i x hk, and hk x hk- 1 , respectively, where hk := n + 1 — n*. Notice 
that in general the fcth block is not a complete block; it is complete if, and only if, n + 1 
is the index of the next computed regular vector. In ( 3 . 4 ), the matrix 


£)<"> = diag(Hf V 1 ,T^,...,W* 7 V l ) 


(3.7) 
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is block diagonal with nonsingular blocks W?V }, l = 1,2, . . . ,Jfc - 1. Its last block W?V k , 
and hence itself, is nonsingular if the kth block is complete. 

Furthermore, the following notation will be used. We set 

H {n) := S {n) H {n \S {n) )~ l and Mi := (Wj r Vi)~ 1 Wj r . 

Generally, a (tilde), as e.g. in denotes intermediate quantities. A :tTn means the 

mth column of A, while Ai-j m means elements i through j of the mth column of A. 

We will assume that the vectors in a block are generated using a polynomial recursion 
of the form 

©m-i(*) =(z - C, u )Q„{z) - T7„0„_i(z), i/ = 0, 1,... , 

(3.8) 

©-i(*) = 0, © 0 ( 2 ) = 1, r ) o = 0. 

For instance, a practical choice for the polynomials in (3.8) are suitably scaled and trans- 
lated Chebyshev polynomials, so that the inner vectors axe generated by the Chebyshev 
iteration [16]. Finally, 0 j,(j 4) will be denoted by just 0„ whenever the meaning is clear 
from the context. 
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4. THE SEQUENTIAL ALGORITHM 


The sequential version of the algorithm biorthogonahzes each inner vector in a block against 
the vectors in the previous block as soon as the vector is constructed, and biorthogonalizes 
the regular vectors against the previous two blocks. 

Suppose we have already completed n steps of the algorithm. Hence, v n and w n axe 
the last generated Lanczos vectors and k = k(n ) is the index of the last block. If u n+I and 
^n+i are constructed as inner vectors, then they are given by 

5 "+l = ~ Cn-n k V n — 77n-n*Un-l, 

"'"+1 = *4 W n ~ Cn-n k W n - T} n - nk W n -i, 

Vn+1 = U n +1 ~ Vk-^W^Vk-l^W^Vn+i 

= u„+i - V k - 1 (WZ. 1 V k - 1 )- 1 Wi‘_ l Av n , 
w n+i = w n + 1 - W k - 1 (W?_ 1 V k - 1 )- 1 W?_ 1 v n+1 
= t5 B+1 - Wk^iWf^Vk-ir'W^Av*, 

or, in terms of scaled vectors, by 


^n+1 


^ n + 1 Av n Cn-n.U n ^n-n.^n-l 




K n+ 1 


W n+1 = A T W n - C n-n k Wn - ^^n-n^n-l 


tn- 


1 


* n 

7 — 1 SJ1 tT 


- fW^T^S^W^V^W^Avn 

l n 


If u n +i and w n +i are regular vectors, then they are given by 


u n +i = Av n , 
w n+ i = A T w n , 

Vn+x = v n+1 - V k - 1 (W!L 1 V k - 1 )- 1 W?_ l v n+1 - V k {W^V k )- l Wjv n+l 
= Vn+l - V k - l (Vffc l V k - l )- 1 WZL 1 Av n - V^WTVkr'WfAvn, 
w n+1 = u5 n+1 - ^._ 1 (^_ i n_ 1 )- 1 ^ r -ii5 n+ i - W k (W?V k r'W?v n+1 
= ^n +1 - - W k {WZV k )- l W?Av n , 
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or, in terms of scaled vectors, by 


5 n-f 1 ~ A * 

*>n+i = Av n 



(4.1a) 

-V t (W^V t r'WfAv„, 

(4.16) 

^n + 1 ~ a t * 

w n+l = A W n 


- 5r-i(wf-i u-, )-* w^Ai n 

L n 

(4.2a) 

- ~W k T k S^( Wjf V k ) - 1 \V£ Av n . 

l n 

(4.26) 


Here we used the fact that at step n, the inner vectors v n and (when appropriate) u n _j 
are already biorthogonal to the previous block W k -\ . Note that using the recursion to 
compute u n+: and w n+i in the case of the regular vectors is redundant, since the regular 
vectors are then biorthogonalized against the vectors in block k , which includes the vectors 
involved in the recursion. 

If u n +i and Wn+i are inner vectors, the size of the current incomplete block k is 
increased by 1; if they are regular vectors, then the £th block is complete, and a new 
block, the ( k + l)st is started with u n+1 and u> n +i as its first vectors. The decision on 
whether to construct u n+1 and u; n+1 as inner or as regular vectors is based on three different 
criteria, see (4.10-4.12). If at least one is satisfied, then u n +i and u> n+1 are constructed as 
inner vectors, otherwise, they are constructed as regular vectors. Next, we motivate the 
three criteria. 

First, recall (cf. (3.7)) that t> n+1 and u?„+i Eire regular vectors if, and only if, WfV k is 
nonsingular. Therefore, we check whether this matrix is singular or close to singular. The 

singular value decomposition (SVD) of Wj[V k is computed, and an inner step is performed 
if 

^min(H^jt'kfc) < tol. (4.3) 

Here tol is a suitably chosen tolerance. For example, Parlett [17] suggests tol = e 1 / 4 or 

tol = e 1 / 3 , where e denotes the roundoff unit. In view of (4.3), it is guaranteed that 
complete blocks of constructed Lanczos vectors satisfy 


^min(I^/ Vj) ^ tol , / — 1,2,... . 

Note that, by [17, Theorem 10.1], (4.4) together with (3.4) and (3.7) imply 

^min(V (n) ) > -^ and flr min (^ (n> ) > n = nj - 1, / = 1, 2, . . . 

v n v n 


(4.4) 


(4.5) 
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Furthermore, for the vectors corresponding to each block, we have 


<WV/) > -j= and <r min (W,) > -~=, 7*1,2,.... 

Remark that the columns of T/( n > and W < n > are unit vectors and that <r min (V r ) respectively 

<^min(^) is a measure of the linear independence of these vectors. In particular, (4.5) 
ensures that the Lanczos vectors remain linearly independent. 

In the outlined algorithm, the block biorthogonality (3.4) and (3.7) is enforced only 
between two or three successive blocks. Unfortunately, in finite precision arithmetic, 
biorthogonality of blocks whose indices are far apart is typically lost. Consequently, in 
practice, (4.5) is no longer guaranteed to hold and thus (4.4) alone does not ensure that 
the computed Lanczos vectors are sufficiently linearly independent. Indeed, numerical tests 
confirmed that, if the look-ahead strategy is based only on criterion (4.3), the algorithm 
may produce within a block Lanczos vectors which are almost linearly dependent. When 
this happens, the algorithm never completes the current block, i.e. it has generated an 
“artificial” incurable breakdown. 

The situation just described occurs if, roughly speaking, a regular vector u„+i is com- 
puted whose component Av n e K n +i(vi, A) is dominated by its component in the previous 
Krylov space K n (v \,A) (and similarly for u> n +i). In order to avoid the construction of 

such regular vectors, we check the la -norm of the coefficients for Vit-j in (4.1a) and V k in 
(4.1b) and compute u n +i as a regular vector only if 

n t -l 

£ I ((»'*’-, v*_ , )- 1 , At>„ jy | < fac-\\A\\ (4.6) 


and 

n 

£ |((W'fU)- 1 Wf M,)i| < fac ■ IMII . (4.7) 

j=n k 


Here fac is a suitably chosen factor. In analogy, by (4.2a) and (4.2b), w n+1 is constructed 
as a regular vector only if 


77 £ 


(4.8) 




and 


t £ ^j((^r , W i T Av„) } j</ac-IIAII. 

n j=n k J 


(4.9) 
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By combining (4.7) and (4.9), we arrive at the criterion (4.11) given below for performing 
an inner step. Notice that (4.6) and (4.8) involve quantities from the previous block k — 1. 
Hence, if (4.6) or (4.8) were violated, one would need to go back and construct the previous 
regular vectors v nk and w nic and the Arth block differently. In order to avoid this, we check 

for (4.6) and (4.8) while building block k — 1, which results in criterion (4.12) below for 
performing an inner step. 

To summarize, we next describe one step of the sequential algorithm. 

DESCRIPTION OF ONE STEP OF THE SEQUENTIAL ALGORITHM: 


Given u>„, n„, fj, k = h(n), (W?V „), and 


Compute u n+1 = Av n , ui n+1 = A T w n . 
Compute 


U n+1 - u n+l - — Vk-lZk-lH ni '_ :nk _ ljn , 

din+1 = dj n+1 — —-^-Wk-l^k-lTk-lS k * 1 Hk-l:n k -l,n- 

s n 


Compute w„v n +i. 

Compute the SVD of (W[V k ). 

Set 

inner = (a min (W k V k )j < tol. (4.10) 

If not inner, then 

Call BUILDHl to compute H nk;n , n = {W k V k )~ l W k Av n . 

Set 


inner = 




> fac ■ p|| . 


(4.11) 


If not inner, then 

Build u n +i and w n +i as regular vectors: 


•Sn-f 1 

S n 


^n+l^n+lj — &n ^n + 1 J~ n V k ^kS nk:n>n ^ , 
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t ^ 1 ({»+l^n+lj ~ £n (w n +l “ ^J^WkEkTkS k 1 Hn k -.n,n) • 


Call SCALE to scale the vectors. 
Compute • 

Call BUILDH2 to compute 

^n fc :n,n+l = (Wj[Vk)~ l Wj Av n +\. 

Set 


inner = 


max 


E * 


J>n - hi 


J=n k 


? 



> foe ■ MU • 


( 4 . 12 ) 


If inner, then 

Build u„+i and w n + 1 as inner vectors: 


^n+l ( _ o 

Sn ^n+lVn+lj 
Hn— l,n 

Hn,n 


— 0"n (^n+1 Cn-n*V n ~a r ^‘ T ln-n k V n -1 ) , 

= £„ (l5 n +i - Cn- ni Wn ~ 1u-n>*n-l) , 


— — 1 „ 

7 /n-n*? 

= Cn — n k • 


Call SCALE to scale the vectors. 
Compute u£ +1 v n+1 . 

Set Jt = &(n + 1) = Jfc(n). 


Call UPDATE to update (W?V k ). 

Call BUILDH2 to compute H nk i : nk - i, n+1 = (W?_ 1 V k -i)- 1 W?_ l Av n+1 . 

else 

Set ifn*:n,n -^n*:n,n 3^ld -ffn* :7i,n-+-l = ■®nfc:n,n+l- 
Set fc = fc(n + 1) = k(n ) + 1. 

Set {W k Vk) = 0 , r»+i^n+l^^ , +1 Un+l- 
Set H n+1 , n = 
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5. IMPLEMENTATION DETAILS 

In this section, we describe in detail the actual implementation of the routines BUILDHl, 
BUILDH2, SCALE, and UPDATE used in the sequential algorithm, as well as the 
procedure for estimating fac (cf. 4. 6-4. 9). Note that in the actual codes, these routines 
except for SCALE — do not appear explicitly; rather, they are coded inline, and 
appear only as logical blocks. 

5.1. BUILDHl 

BUILDHl takes as input (W?V k ) :<n _ nk+u (W?V k )-\ and <r n( £ n u£{5 n+1 , and returns 

:n,n* 

Consider a term of the form wjAv n , n k < j < n, n = n k+1 — 1. Let v = j — n k , 
H = n - n k . We distinguish two cases: 

(i) j = n. 

We compute the w^Av n term directly, since we do not know all the terms in the 
recurrence for either vector. 

(ii) n k <j <n- 1. 

Here, 

w J Av » k+l -i = ( Q >'U T )^n k ) T A(Q fl (A)v nk ) 

Q t/AQ nV nit 

= + ■n l/ Q v -x)Q tl v nk 

= ^n t 0^+l€)/iW nik + Qv®/iVn k +^ | 0„_ 1 0 M V nt 

= + (ywfvn + TlvwJ^Vn, 

all of which are terms which have already been computed as part of W k V k . Hence, we 
have: 


W?Av n = (W k T k l ) T Av n 

= T k T l w n k Av n ••• w^_ x Av n w^Av n ] T 
= T I r r [*‘* ( W j+1 + CuWj + T} u Wj-l) T V n ••• w^Av n ] T 

= T k T [--- U>J+lV n +C»wJvn +r} v wf_ 1 v n ••• w^Av n ] T 
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~ T k T ( • ' ‘ (tj+iwj+i + ( ‘vtjwj + Tivtj-lwJ.^Vn ■ 
= ... ^^wj +1 v n + („wjv n +T}„l^wJ_ 1 V n ••• 


where j = n k , . . ., n k+1 - 2. With this, = {WTV k )~ l WT Av n . 

term, we use: 


“»- 4C " = a n(Av „ - 


^ n (^n^n+1 ) 

= <7nZnWnVn+l 


because this formulation has better numerical properties. 


■ • t n w^Av n ] T 
W^Av n 

For the w 
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h c 


5.2. BUILDH2 


The logical block BUILDH2 takes as input <x n+1 , £„*, 7^*-, v n+1) and the last 

column of (W^ V*_i) -1 , and returns H Uk :nfc _ 1|n+1 . 

Consider a term of the form wj Av n +i, n k _ 1 < j < n k — 1, n k — 1 < n < n k+l — 2. 
Let v — j — n Jt _ 1 , n — n — n k . We distinguish two cases: 

(i) j = njt - 1. 


= wl Q v Av n+1 
= j40„t>n+l 

*-l 

= ^n»t>n+l. 

Here we used the fact that the polynomial for u n +i is orthogonal to the polynomials 
in blocks n k _ 1 and njt-2, which appear in the expression for the polynomial for w nit . 

(ii) ”jt_i <j<n k - 2. 

wjAv n+1 =(Q u (A T )w nk _ i ) T Av n+ 1 
= 0„Au n+1 

*-l 

= wl A0„u n+1 

= W n k ^©*+1 + + WvQ»-l)Vn+l 

= 0. 

Hence, we have: 

Bn k _ lt n h -l,n + 1 = #F-l^»+l 

= (WT_ 1 V k - 1 )-'(W k - 1 T k 2 l ) T Av n+1 
= (W?_ 1 V k - 

• V>*^AVn+l Wn k - 2 Av n +l M>J k _ 1 At) n+ i J T 
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(^.n-O-T^lO ... 0 <«„ +1 ] T 
(M^-n-o-Trilio ••• o (<„.<i,.,) T o„ + ,] 1 ' 
[o ••• O J^ T <0„ + ,] 1 ' 

Jo ••• 0 iJ.Sn+i 
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5.3. SCALE 


SCALE takes as input <r n , f„, £> n+1 , tZ> n+1 , f a , and returns u„+i, u>„+i, <r n +i, 6»+l> % tl , 

In 

, and • It computes the scale factors s„+i and t n +i so that u„+i and w n +i both 
are unit vectors. This gives: 


•Sn+l 


c n+l 


°n+lVn+l 

f \ 

£n+lU’n+l 


Pn+l||) ( 
^»||t 5 n+l||^ ( 


ll«~ 1 »ll 5 " +I ) ’ 

||l5n+l||“’" +1 ) ' 


The algorithm is as follows: 

5 = l|Vn+l||, 
t = ll^n+x||. 

If either s • <r n or t • is small, we have found an invariant subspace. 

_ _ 1 
< 7 n+l - J, 

Zn+1 = J- 

If cr n +i is too small or too large, then 
^n+l = ^n+lUn+l, 

^n+l = 1. 

II fn+i is too small or too large, then 
Wn+l = fn+ld>n+l> 
fn+1 = 1, 

^ -£.|w»+i||, 

^n+1 _ -Sn-M t n S n 

tn+1 s n t n +l Tn' 

v n + 1 = ^n+1 , 

w n+ l = lOn+1- 
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5.4. UPDATE 


UPDATE takes as input & n +i, Cn+ 1 , w^+iVn+i, and (W^V*). It appends a new row and 

column to the matrix (WjfV It). First note that (WjfVk) is symmetric, hence only its upper 
triangular part has to be constructed. Let W{ and vj be two vectors from the current block, 
and v = i — rik, fi = j — rik be the corresponding block indices. 

Then, 


/ 7 1 

w T v j = \Qv(A T )w nk J © M (A)u njt 

= - C,, — !©,, — ! - ^-10 m -2K 4 

— W n k ®i'AQ tl -iV nk — I^J k (C^-10v0^_l + 7 ?^-10|/0^-2)Un* 

~ W n k AO„Q ft -iV nk — tXJ^’ jk (^_ 1 @ v 0 /1 _ 1 + 77^-l0^0^_2)u n * 

= ^n*(0</+ 1 + C*0„ + 

~ U; n»(CM-10*'0^-l + ^-10^0^-2)Un* 

= “nt 0«/+l ©^-1 V nt + Ci/ u, n fc ®»'0A*-l t; nt + 7 ?*/ u ’n*©t'-10/i-l u n* 

- ~ Vft-l w n k ^v^li-2 v n k 

— “'n* (©*/+! ©,»-l - ^-l©^0^-2)Wn* + (Cl/ - CM-l) u, n*©*'0»»-lVn* 

+ Vn^n k ©^-l ©M-l^n* • 

Jl _ 

= U, »+1 V J-1 “ T lj-n k -lwfVj -2 + (C«-n* ~ Cj_n* -1 V^'-l 

+ T l}-n k wf_ 1 Vj-i. 

This shows that an element on the mth diagonal can be computed from elements on 
the previous two diagonals and previous elements on the mth diagonal, leaving only the 
main diagonal and the first superdiagonal to be computed by inner products. Hence, the 

complete hk x hk matrix (WjV*) corresponding to the complete fcth block can be built 
with only 2hk — 1 inner products. 

To update (W^V*), we then have: 

a a 0 rp 

^n+l v n+l = ^n-1-1 1 


- T 
W J n 


^n+1 j ^n+1 — £ n — nfc U n — 


— 1 d'n — 1 _ 
S n <7 n V* 


-nkVn-l'j'j 


22 


^T^n+l 


^"Sn+l (^n^n+1 Cn-n* wjC 1 ’* ~ ^ 1 £ n 1 *1n-n k U>^U n -l ^ 

— ^^"Vi ^n^n+1 — Sn ^j Cn-n**«r^n “ 3„"i i,,' ’/"-"k^n-l, 

= J^ W ?Vn+l 

= « n +it,- (^H-l^n - TJn-n.wJ'vn-! + (Ci-n* - Cn-nJ^n) 

+ 5^T?7 (^»-«*+l W ^-l»«) 

= Sn+j*,- (sn*,+ lwT+lVn ~ Vn-n„ S n - 1 t i W?V n - 1 ') 

5 n -)-i<j ((Ci— n* Cn — nk) s nti‘&fv n + f]n— n* + l 5 n ii — 1 ^>JL i 

_ 5 n ^«-M ~T ~ 5r> — 1 ^ T' /s 

- S^‘% W i+ 1®» “ s^»?n-n*U;?\>„-l 

+ 3^T(Ci-n t ~ Cn-nJ^n + ^"N n-nt+l^Un, 


« T 

^n+lUi 


= 3^T u ’»+i“' 

= 3^“^”+' 

= £ ¥ 1 rf; 1 t ". T ' c "+i. 


where i = n*, . . ., n. 
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5.5 Estimating fac 


Recall that the checks (4. 6-4. 9) are used to ensure that the Lanczos vectors axe suffi- 
ciently linearly independent to avoid artificial incurable breakdowns. However, numerical 
experience with matrices whose norm is known indicates that setting fac = 1 is too strict 
and results in artificial incurable breakdowns. A better setting seems to be fac = 10, but 
even this is dependent on the matrix. In addition to estimating /ac, in practice one is 
faced with the problem of estimating the matrix norm as well. This problem becomes even 
more complicated when solving linear systems, because one usually replaces the original 
system by a preconditioned one. Finally, in practice there is also the issue of a maximal 
block size, which is a user-specified value related to limits on available storage. To solve 
the problems of estimating norms and a suitable factor /ac, as well as coping with lim- 
ited storage and yet allowing the algorithm to proceed as far as possible, we propose the 
following procedure. 

Suppose we arbitrarily set ||A|| = 1, where A denotes the matrix actually used in gen- 
erating the Lanczos vectors, thus including the case when we are solving a preconditioned 
linear system. Then we are left with estimating just /ac, which is done dynamically. In 
each block, whenever an inner vector is built due to (4.11) or (4.12), the algorithm keeps 
track of the size of the terms that have caused (4.11) or (4.12) to be true. If the block 
closes, then this information is discarded. If, however, the algorithm is about to run out 
of storage, then fac is replaced with the smelliest value which has caused an inner vector 
to be built, and the block is rebuilt. This time, the updated value of fac is guaranteed 
to pass at least once the checks in (4.11) and (4.12), and hence the block is guaranteed to 
dose. This frees up the storage that was used by the previous block, thus guaranteeing 
that the algorithm can proceed (the procedure extends easily to the case when the current 
block is the first one). 

This procedure allows then the algorithm to run until a block is built entirely due to 
(4.10). This situation represents an incurable breakdown, given the limits on storage, and 
forces the algorithm to stop. 
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6 . THE BLOCK ALGORITHM 


The block version of the algorithm differs from the sequential algorithm in that it generates 
the entire block before biorthogonalizing it against the previous block. This makes it more 
efficient on a parallel machine. For INBLOCK, we then have 

^n+l SnCn— Sn— l*7n— 1> 

^n+1 tnA W n tnCn — nii U’n — U^n— 1 • 


In addition, depending on the recursion used, one might want to monitor the norms of the 
vectors and scale when needed, to ensure numerical stability. We now want to biorthogo- 
nalize the new block against the previous vectors. Let j denote the index of vector vj in 

the block, vj = 0 M (A) v nk , fi = j — n k . We want 


[u>i w nk - tl .i w nk - tl 


w„ k -i 



We have 


[^1 Wnb-n - 1 W njk _ M ••• W nk -i] T Vj 

[^1 ’’’ ^n k —fi — 1 —fi W nk — i ] 

= [ 0 #i(A T )ti? 1 ••• © #1 (A T )iy n4 _ M _ 1 0 M (A T )tt7 njk _ M ••• Q tl (A T )w nk -i] T v nk 

= [0 0 * ••• *] T , 

where the last fi entries, denoted by *, are generally non-zero. On one hand, the regular 
polynomial for v nk is orthogonal to all polynomials of degree less than n k ; on the other 
hand, multiplying the polynomials for the previous vectors by Q u raises their degree by 
/i, thus raising the degree of the last fi polynomials to n* or more, thus introducing 
the non-zero entries. Furthermore, in biorthogonalizing vj against w nk - fl , one introduces 
components along the other vectors in Un^-^’s block, as v n *-/i is not biorthogonal to the 
vectors in its own block. Hence, to biorthogonalize Vj against the previous vectors, we 
need to biorthogonalize it against all the vectors in the blocks containing v ni - fl to u nt _i. 
Let rf denote the number of the block containing v nk - fl , 17 < k — 1. Then we have for 
NEXTXY 

V k = v k - V k -iM k -iV k - Vk- 2 M k - 2 V k Vr,M v V k , ( 6 . 1 ) 

W k = W k - - W k - 2 T k - 2 Stl 2 M k -,V k 

w,T,s; l M n v„, ( 6 . 2 ) 
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(6.3) 

(6.4) 


v k = V k S k \ 
w k = W k T~\ 


Vn+l - ^nAv n (-S„Cn-n k Vn + S n -irj„- nk V n -i), 
t^n+1 = tnA T W n (-t n (; n _ nk w n + < n _ 1 77 n _ n(k lD„_ 1 ), 

v n+l = Un+1 — V k -iM k -iV n +i — V k M k V n+ i, 

“’n+l = ™n+l — W k -\Tk-lSkliM k -lV n +i — W k T k S^ 1 M k V n+ i, 


^n+1 — v n+l/-S n +l, 
W n+1 = W n+1 /t n+1 . 


The terms in parentheses in (6.3) and (6.4) are not strictly necessary, since one then 
biorthogonalizes against these vectors, but they could enhance the numerical stability. If 
the size of the current block is at most the size of the previous block plus 1, then we have 

n *+i ~ n k < n k - n k _ 1 + 1 O- n k _ 1 < 2 n k - n k+1 + 1 & 
n k-i <n k - (n k+1 - n k - 1) n k _ 1 <n k - fi max & 

T) = k- 1 , 

which shows that under these conditions, the formulas (6.1) and (6.2) for V k and W k reduce 
to just two terms. Here fi max is the largest value of fi. Finally, we note that the products 

of the form MjV k that appear above have the structure 

(H 'JVjT'WfVt = (1 VjVj)-' 

since the regular vector is biorthogonal to all previous blocks. 

Computing j s trickier, since we modify the vectors used in the inner recursion. 

Nevertheless, retains the block tridiagonal structure (3.5). We will show this by 

induction. Assume that (3.3) and (3.5) hold for all n = n t - 1, / = 1, 2, . . . , k - 1. Thus 

AVi = Vi-\Pi + Vjd/ + V't+ 1 7/+i, / = 1,2, ... ,k — 1. (6.5) 


0 * 
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We need to show that (6.5) also holds for 1 = it. For V*, we have 


AV k =AV k S ’J 1 

= A (v k - V^M^Vk - V k - 2 M k - 2 V k V v M v V k ) S' 1 

= (v k a k + [0 ... 0 6 n+1 ]) S" 1 - AVk-iMk-iVkS^ 1 - • • • 

- AV v M v V k S^ 1 

= (VfcS* + n-iMfe-xVit + ■ ■ • + V v M v Vkj a k S^ 

+ [0 ••• 0 Un+lSn+1 + Vk-lMk-lVn+l + V k M k V n+ i ] S^ 1 

~ (Vk-20k-i + Vfc-idjfc-i + Vkikj M k -\ V k S k l 

- (v k - Z 0 k -2 + n_ 2 d t _ 2 + Vjfc-xTJb-i) M k - 2 V k S k l 

- (Vk-Jk-3 + Vk-3&k-3 + Vk- 2 %- 2 ) M k - 3 V k S k ' 

(ViA + %«v + V n+1 % +1 ) M.VkS, 1 

= -Vv~i(^M v V k S^)+--- 

+ ^*-2 (ac k M k - 2 — 0 k -iM k -i — a k - 2 M k - 2 — 7jfc_ 2 Affc_3^ V k S k l 
+ Vic — 1 ^M k -iV k a k + [0 ••• 0 M k -iv n+ i ] — a k -iM k -iV k 

— nfk-iM k - 2 V k ^j S k * 

+ Vjfc (s k a k + [0 ••• 0 M k v n+ i]-‘y k M k -iV k > j S^ 1 

+ ^n+l (Sn+l [0 ••• 0 Cj]^ 1 ), 

where e\ is the first column of the identity matrix. Suppose now we multiply the equation 
on the left by Then on the right hand side, only the V^_i term survives, as all 

the other blocks axe biorthogonal to W^-i by construction. In addition, on the left hand 
side, the term is also zero, again by biorthogonality. Hence, the coefficient of V^-i in the 
above expression is zero. Similarly, by multiplying on the left by W v , W v +i , ..., W k - 2 , 
one shows that in fact all the coefficients up to V k - 2 are zero. Hence, we are left with: 
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AVk — Vk~\ (^Mk-i VjtQjt + [0 ••• 0 Mk-iv n +i] — atk-iMk-iVk 

-7k- iMk-iVk'jS; 1 

+ Vk(s k &k + [0 ... o MkVn+il-jkMk-iVk'jSt 1 

+ V„+i(5 n+1 [0 ••• 0 e^S*" 1 ). 

The coefficient matrices 6c k, 0k, and -yjt+i are easily recognized. Note that the matrix a* 

which appears on the diagonal of H is a comrade matrix (from the d* and v„+i terms), but 
in addition, the first row fills in (from the 7k term). This is different from the sequential 
case, where the diagonal blocks are just comrade matrices. 

Finally, we would like to stress that — even if long recurrences (of more than two 
terms) occur in the updates (6.1) and (6.2) for Vj and W k — the matrices AVk , Vk > 

Vk - i (and similarly for the W matrices) are still connected via a three-term recursion, 
cf. (6.5). Both the sequential and the block algorithm generate upper Hessenberg matrices 

with the same block tridiagonal structure. This is important if the block algorithm is 
used for eigenvalue computations or in conjunction with the QMR approach (see Section 9) 
for solving linear systems. 
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7. CONCLUDING REMARKS 


We have presented the details of an implementation of the look-ahead Lanczos algorit hm 
for non-Hermitian matrices. Our implementation can handle look-ahead steps of any length 
and is not restricted to steps of length 2, as earlier implementations are. Also, the proposed 
algorithm requires roughly the same number of inner products as the standard Lanczos 
process without look-ahead. It was our intention to develop a robust algorithm which can 
be used in a black box solver. 

This paper is continued in Part II [6]. There, a robust black box solver for non- 
Hermitian linear system, the QMR algorithm based on the look-ahead Lanczos algorithm, 
is presented. Also, in [6], numerical experiments, both with the implementation of the 
look-ahead Lanczos method proposed here and with the QMR algorit hm , are reported. 
Finally, for the case of real nonsymmetric matrices A, the FORTRAN programs for these 
algorithms are listed in Part II. 

In the future, we plan to also provide FORTRAN codes for complex non-Hermitian 
matrices. Often when complex matrices arise in practical applications, they are complex 
symmetric. For this important special case, the look-ahead Lanczos algorit hm ran be 
arranged such that left and right Lanczos vectors coincide, and thus work and storage is 
halved. We also plan to provide FORTRAN codes for the resulting complex symmetric 
variant of the look-ahead Lanczos algorithm. 

In the present paper, we have only outlined a block version of the look-ahead Lanczos 
algorithm which appears to be better suited for parallel computers. Details of an actual 
implementation and experiments with it will be presented elsewhere. 
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